Introduction

Team

  • Jonas Han (jh3877): Introduction, Executive Summary, data cleaning, GitHub folder/file structure
  • Justin Ali Kennedy (jak2294):
  • Lerong Chen (lc3375):
  • Sidney Frank Fletcher (sff2114):

Stop-and-Frisk is a NYPD practice of briefly detaining, questioning, and sometimes frisking or searching civilians for weapons and/or contraband. It was quite prevalent during then-Mayor Michael Bloomberg’s tenure as he was a strong proponent of the program. Unsuprisingly, stop-and-frisk became quite controversial during the height of the program in 2011. Many critics of the program claimed that the NYPD was using racial profiling when selecting suspects, specifically Black and Hispanic suspects. Other criticized the program’s efficacy in reducing the crime rate in NYC. Proponents of the practice argued that stop-and-frisk made the city safer, especially in neighborhoods with high Black and Hispanic populations.

Our motivation in selecting this project is to analyze the stop-and-frisk data and determine for ourselves if the NYPD was indeed using racial profiling to select suspects. Additionally, we wanted to understand the program’s effectiveness in actually deterring and/or stopping crime in NYC. Lastly, one of our team members (Jonas) was a high school student in Brooklyn around the peak of the stop-and-frisk program and witnessed many stops in the train station nearby his school.

Main Analysis

nycarrests=read.csv(file="./Raw_Data/NYPD_Arrests_Data__Historic_.csv", header=TRUE, sep=',', na.strings = c("", "NA"))
register_google(key = "AIzaSyB72pIzWJufJCUD_diZUOTDYtfkfwnyrME ") 

nyc_map <- get_map(location = c(lon = -74.00, lat = 40.71), maptype = "terrain", zoom = 11)

r <- GET('http://data.beta.nyc//dataset/0ff93d2d-90ba-457c-9f7e-39e47bf2ac5f/resource/35dd04fb-81b3-479b-a074-a27a37888ce7/download/d085e2f8d0b54d4590b1e7d1f35594c1pediacitiesnycneighborhoods.geojson')

nycarrestsls = nycarrests[, c(17,18)]

nyc_neighborhoods <- readOGR(content(r,'text'), 'OGRGeoJSON', verbose = F)
joinnbs <- tidy(nyc_neighborhoods, region="neighborhood")

nycarrestsls2=nycarrestsls
coordinates(nycarrestsls2) <- ~Longitude + Latitude
proj4string(nycarrestsls2) <- proj4string(nyc_neighborhoods)

nbmatch <- over(nycarrestsls2, nyc_neighborhoods)
llframe <- cbind(nycarrestsls, nbmatch)

llsample=llframe[sample(nrow(llframe), 50), ]


#leaflet(nyc_neighborhoods) %>%
  #addTiles() %>% 
  #addMarkers(~Longitude, ~Latitude, popup = ~neighborhood, data = llsample) %>%addProviderTiles("CartoDB.Positron")
nbpoints <- llframe %>%
  group_by(neighborhood) %>%
  summarize(Arrests=n())

joinpointsnb= left_join(joinnbs, nbpoints, by=c("id"="neighborhood"))

ggmap(nyc_map) + geom_polygon(data=joinpointsnb, aes(x=long, y=lat, group=group, fill=Arrests),alpha=0.9)+ggtitle("Arrests By Neighborhood NYC 2013-2017")+labs(y="Latitude", x = "Longitude")+scale_fill_viridis(labels=comma, name="Arrests", limits=c(0, 80000))

To make the above visualization showing the number of arrests by neighborhood in the years 2013-2017, we used the Google Maps API. The get_map function allows us to set the background mapping to cover the span of New York City. The GET command allows us to import a JSON that maps out the boundaries of all the neighborhoods in NYC. We then use a series of functions to match the longitude/latitude coordinates in the arrests dataset to their corresponding neighborhoods. The coordinates function takes the latitude/longitude coordinates and converts the dataframe into a SpatialPointsDataFrame which then can be used as input to the proj4string package functions. Using proj4string, we are able to map the longitude/latitude coordinates of the arrests dataset to the coordinate system used in the nyc neighborhood JSON. The over function is then used to find spatial intersections between the arrest coordinates and their corresponding neighborhood. Finally, to prep the data for the map creation, a series of pipelines is used to find the number of arrest coordinates per neighborhood using group_by and summarize, and joining the nyc neighborhoods dataframe with the number of points by neighborhood dataframe. gg_map is then used to create the map with gg_polygon added to fill the boundaries of each neighborhood. A scale gradient is used to show the difference between number of arrests of different neighborhoods. We chose a scale gradient because it seemed to work well for differentiating adjacent and non-adjacent neighborhoods on a continuous scale for comparison. Furthermore, we believe using this visualization type as opposed to a different medium such as a bar chart or cleveland dot plot allows the audience to quickly decipher low/high arrests neighborhoods as well as identify geographic spatial clusters between adjacent neighborhoods and understand the distrbution of arrests by borough and NYC at large that wouldnt be visible in a bar chart or dot plot.

This map visualization shows the number of arrests per neighborhood in New York City in the time period from 2013-2017, with the more brighter colors representing neighborhoods with higher densities of arrests. We can observe certain neighborhoods with relatively high numbers of arrests such as east Queens and the Canarsie, Crown Heights, and East Flatbush areas of Brooklyn. Other neighborhoods such as those in Staten Island and south Brooklyn show relatively low arrest densities in comparison. We can also notice certain similar high-arrest patterns across adjacent cities such as in the middle of Brooklyn and Upper Manhattan. One major commonality among these named high arrest neighborhoods is the relatively high Black/Hispanic populations. Though we certainly can’t attribute the difference between these high/low density arrest neighborhoods solely to race, its a component to investigate further. Other variables that could contribute to the disparities include employment rate and income.

nycsqf=read.csv(file="./Raw_Data/sqf_data_peak_years.csv", header=TRUE, sep=',', na.strings = c("", "NA"))
nycsqfls = nycsqf[, c(107,108)]
nycsqfls=nycsqfls[complete.cases(nycsqfls), ]

nycsqfls<- data.frame(sapply(nycsqfls, function(x) as.numeric(as.character(x))))

mapping <- project(nycsqfls, inverse=TRUE, proj="+proj=lcc +lat_1=41.03333333333333 +lat_2=40.66666666666666 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000.0000000001 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=ft  +to_meter=0.3048006096012192 +no_defs")

latlonSF <- data.frame(Latitude=mapping[[2]], Longitude=mapping[[1]])

latlonSF2 = latlonSF
coordinates(latlonSF2) <- ~Longitude + Latitude
proj4string(latlonSF2) <- proj4string(nyc_neighborhoods)
nbmatchSF <- over(latlonSF2, nyc_neighborhoods)
llframeSF <- cbind(latlonSF, nbmatchSF)

nbpointsSF <- llframeSF %>%
  group_by(neighborhood) %>%
  summarize(Stop_and_Frisks=n())

joinnbsSF <- tidy(nyc_neighborhoods, region="neighborhood") 

joinpointsnbSF <- left_join(joinnbsSF, nbpointsSF, by=c("id"="neighborhood"))
llsample=llframeSF[sample(nrow(llframeSF), 50), ]


leaflet(nyc_neighborhoods) %>%
  addTiles() %>% 
  addMarkers(~Longitude, ~Latitude, data = llsample) %>%addProviderTiles("CartoDB.Positron")

Main Analysis: In the above map, we plotted the coordinates (longitude/latitude) of a random subset of Stop and Frisks. Given the longitude and latitude of every recorded Stop and Frisk between 2008-2012, we used the sample function to take the random 50 point subset. To make the plot, we utilized the leaflet package to take the nyc neighborhoods mapping JSON and plot the random data points accordingly. We decided to use 50 points as it appeared to be the maximum number of points before the visual became overcrowded and tough to intrepret. Similar to the arrest region map, plotting the arrest locations on a map as opposed to a different plot allowed us to compare regions geographically in terms of potential clusters of arrests.

Executive Summary: Through this visualization we are able to see a random subset of Stop and Frisks (50) throughout NYC between 2008-2012. In this sense, we get an idea of the distribution of S&F’s throughout NYC as well as potential spatial clusters. We can notice from the map apparent clusters of arrests in Upper Manhattan and central Brooklyn. East Queens and Staten Island on the other hand have very few points marked. While by no means completely representative of the level of Stop and Frisking in NYC, these clusters potentially forecast areas to look into for more extensive analysis into the relationship between arrests and Stop and Frisks.

ggmap(nyc_map) + 
  geom_polygon(data=joinpointsnbSF, aes(x=long, y=lat, group=group, fill=Stop_and_Frisks), alpha=0.9)+ggtitle("Stop and Frisks By Neighborhood NYC 2008-2012 Peak Years")+labs(y="Latitude", x = "Longitude")+scale_fill_viridis(labels=comma, limits=c(0, 150000), name="Stop and Frisks")

For the Stop and Frisk neighborhood fill plot, the process was the same as for the arrests neighborhood fill plot. Here the the coordinates of all Stop and Frisks in the period 2008-2012 were taken as inputs. However, one difficulty with the S&F data is that the coordinates are in terms of x,y values. To convert these coordinates to latitude/longitude,we used the project function with its inputs the x,y coordinate bounds we were mapping from and the longitude/latitude bounds we were looking to map to as well as the coordinate system features of the x,y coordinates. The NYC gov website publishes the parameters they use for all their coordinate datasets in an online document titled ‘Citywide Guidelines for Geographic Information Systems’. For example, for x,y geosptial coordinates they adhere to the NAD83 datum and GRS80 ellipsoid basis. The project function uses this data to complete the conversion. After taking the resulting longitudes/latitudes of all the recorded Stop and Frisks, the pipeline is then the same as for the arrests maps in converting this information into a neighborhood fill graph.

Another issue that came up for the Stop and Frisk data is that the available historical data files were too big to be imported into R. As a result, we used the ‘peak years’ dataset from 2008-2012. In this sense we’re making inferences between the Stop and Frisk and Arrest data (historical data only covering 2013-2017) on the premise that the number of arrests by neighborhood hasnt changed drastically in the last 6-8 years (Shown in Interactive Component).

From the 2008-2012 Stop and Frisk visualization above, we are able to see the number of stop and frisks per neighborhood in New York City during the peak years of New York’s Stop and Frisk initiative. From the image, we can see certain neighborhoods of NYC having significantly more S&F’s compared to others. In particular, Roosevelt Island and Fort Wadsworth, with 16 and 55 S&F’s respectively, have comparatively low S&F counts while neighborhoods like Bedford-Stuyvesant and East Harlem (137770 and 115096 S&F’s respectively) have comparatively large counts. The map visualization also shows spatial clustering among neighborhoods with low and high counts of S&F’s. For example, Upper Manhattan and Brooklyn appear to have multiple neighborhoods in the same vicinity with high levels of S&F’s while Lower Manhattan and Staten Island neighborhoods appear to have uniformly low levels. Obviously, there are multiple variables to consider that these patterns could be attributed to.

Additionally, we can compare this visualization to the Arrests by Neighborhood map and notice some clear similarities. Namely, despite the maps differing in time frame, one can notice the number of arrests per neighborhood correlating heavily with the number of S&F’s per neighborhood. Intuitively, this agrees with what one would expect. What’s perhaps more interesting is the divergence between arrests and S&F’s of certain neighborhoods; for example Midtown/Lower Manhattan and Rochdale, Springfield Gardens, St. Albans neighborhoods of Queens.

We look to investigate an additional component of this relationship between number of arrests and number of S&F’s through the following interactive component.

Executive Summary

From the frequency bar plot below, we observe that the practice of stop-and-frisk increased significantly during then-Mayor Michael Bloomberg’s tenure in office, which began in 2002 and ended in 2013. The practice peaked in 2011, with over 650,000 stops, before being severely halted in 2013 and 2014. The subsequent sharp decline in stop-and-frisk was led by a federal lawsuit against the practice in 2013 as well as Bill de Blasio’s election to the mayor’s office in 2014. Bill de Blasio was an avid opponent of stop-and-frisk and ran a campaign promising to reform the practice. Since taking office in 2014, stop-and-frisk has continued to decrease every single year.

sqf_by_year <- read.csv(file="./Data/sqf_by_year.csv", header=TRUE)
ggplot(data=sqf_by_year, aes(x=year, y=count/1000)) + geom_bar(stat="identity") + ggtitle("NYPD Stop-and-Frisk (2003-2017)") + theme_grey(18) + theme(plot.title=element_text(hjust=0.5)) + scale_x_continuous("Year", labels=as.character(sqf_by_year$year), breaks=sqf_by_year$year) + ylab("Frequency (Thousands)")

For an preliminary analysis, we compared the race of stop-and-frisk suspects against the demographics of NYC from the 2010 Census. From this initial perspective, it appears that there may be an issue of racial profiling in regards to the stop-and-frisk program. From 2003 to 2017, over 50% of stop-and-frisk suspects were Black while less than 25% of NYC is Black. Furthermore, White suspects accounted for less than 10% of stops, but represent the majority(>30%) of NYC residents. A similar disparity exists for Asians across the two datasets. Note that it was difficult to compare the Hispanic populations across the two different data sources as the stop-and-frisk dataset buckets all Hispanics into White Hispanics or Black Hispanics. In the 2010 Census data, most Hispanics responded in the “Some Other Race alone” category and not in the “White alone” or “Black or African-American alone” categories.

arrests_by_race <- read.csv(file="./Data/arrests_by_race.csv", header=TRUE)
census_by_race <- read.csv(file="./Data/census_by_race.csv", header=TRUE)
complaints_by_susp_race <- read.csv(file="./Data/complaints_by_susp_race.csv", header=TRUE)
shootings_by_perp_race <- read.csv(file="./Data/shootings_by_perp_race.csv", header=TRUE)
sqf_by_race <- read.csv(file="./Data/sqf_by_race.csv", header=TRUE)

race_data <- do.call("rbind", list(arrests_by_race, complaints_by_susp_race, shootings_by_perp_race, sqf_by_race))
race_data$dataset <- factor(race_data$dataset, levels=c("Stop-and-Frisk", "Complaints", "Arrests", "Shootings"))

par(mfrow=c(1,2))
par(mar = c(11,4,3,1) + 0.1)

barplot(sqf_by_race$percentage, names.arg=sqf_by_race$race, main="Stop-and-Frisk by Suspect Race", ylim=c(0, .55), las=2, ylab="Percentage", yaxt="n")
axis(2, at=pretty(sqf_by_race$percentage), lab=paste0(pretty(sqf_by_race$percentage) * 100, "%"), las=2)

barplot(census_by_race$percentage, names.arg=census_by_race$race, main="NYC 2010 Census by Race", ylim=c(0, .55), las=2, ylab="Percentage", yaxt="n")
axis(2, at=pretty(sqf_by_race$percentage), lab=paste0(pretty(sqf_by_race$percentage) * 100, "%"), las=2)

In response to the racial profiling controversy, then-Mayor Michael Bloomberg stated that “If we stopped people based on census numbers, we would stop many fewer criminals, recover many fewer weapons and allow many more violent crimes to take place”. He has also further stated in an op-ed for The Washington Post that “Ninety percent of all people killed in our city — and 90 percent of all those who commit the murders and other violent crimes — are black and Hispanic”. We were interested to find out for ourselves if the former mayor’s statements were true. Thus, we looked at the suspect/perpetrator race variable across the stop-and-frisk, complaints, arrests, and shooting incidents datasets to answer the question. From the plot below, we observe that the racial composition of the suspects/perpetrators in stop-and-frisk, complaints, and arrests datasets were all fairly similar. That is, it appears that Blacks are stopped disproportionately compared to the percentage of Blacks in NYC, but not compared to the percentage of complaints filed against Black suspects or arrests of Black perpetrators. Shooting incidents in NYC skew even more heavily towards Black perpetrators. This supoprts Michael Bloomberg’s claim that “the proportion of stops generally reflects our crime numbers”. There are two possible explanations for this. The first is that society is prejudiced against Blacks and thus are more likely to file complaints against Blacks or claim that a Black suspect committed a crime. Another possibility is that Blacks commit crime at disproportionate rates compared to other races.

cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

race_bar_chart <- ggplot(race_data, aes(x=race, y=percentage, fill=race)) + geom_bar(stat="identity") + facet_grid(.~dataset) + theme(axis.text.x = element_text(angle = 60, hjust=1)) + xlab("Suspect/Perpetrator Race") + ylab("Percentage") + guides(fill=guide_legend(title="")) + scale_fill_manual(values=cbbPalette) + scale_y_continuous(labels = scales::percent_format(accuracy = 1))
race_bar_chart

Next, we compared the suspect/perpetrator race against the victim race for the complaints and shooting incidents datasets. We wanted to discover how closely the victim race distributions aligned with the suspect/perpetrator race distributions. Across all four bar plots below, Blacks are overrepresented compared to the NYC Census data. That is, Blacks disproportionantely commit more crime, but are also more likely to be victims of crime. It is also interesting to note that complaint victims skew less towards Blacks and more towards Whites and Asians compared to complaint suspects. This is in contrast to shooting incidents, where the distribution of perpetrator race is nearly identical to the distribution of victim race. This may indicate that shootings typically involve perpetrators and victims of the same race. This is a topic that we would have liked to researched further, given more time.

complaints_by_vic_race <- read.csv(file="./Data/complaints_by_vic_race.csv", header=TRUE)
complaints_by_vic_race$susp_vic_category <- "Complaints | Victim"
shootings_by_vic_race <- read.csv(file="./Data/shootings_by_vic_race.csv", header=TRUE)
shootings_by_vic_race$susp_vic_category <- "Shootings | Victim"

complaints_by_susp_race$susp_vic_category <- "Complaints | Suspect"
shootings_by_perp_race$susp_vic_category <- "Shootings | Perpetrator"

susp_vic_by_race <- do.call("rbind", list(complaints_by_vic_race, complaints_by_susp_race, shootings_by_perp_race, shootings_by_vic_race))
susp_vic_by_race$susp_vic_category <- factor(susp_vic_by_race$susp_vic_category, levels=c("Complaints | Suspect", "Complaints | Victim", "Shootings | Perpetrator", "Shootings | Victim"))


cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

susp_vic_bar_chart <- ggplot(susp_vic_by_race, aes(x=race, y=percentage, fill=race)) + geom_bar(stat="identity") + facet_grid(.~susp_vic_category) + theme(axis.text.x = element_text(angle = 60, hjust=1)) + xlab("") + ylab("Percentage") + guides(fill=guide_legend(title="")) + scale_fill_manual(values=cbbPalette) + scale_y_continuous(labels = scales::percent_format(accuracy = 1))
susp_vic_bar_chart

The vast majority of stops(approx. 88%) involved innocent suspects and only 12% of stops led to an arrest or summons. Furthermore, only about 3% of stops found a weapon or contraband. It is difficult for us to judge whether or not these catch rates are effective as we do not have another metric to compare it to. Critics of stop-and-frisk may argue that this may be a wasteful use of police resources since the vast majority of stops involve innocent civilians. However, proponents of the practice may argue that over the course of the last 15 years (2003-2017), an extra 50,000 weapons have been taken off the streets, 100,000 contraband were confiscated, and 300,000 criminals were arrested.

not_innocent_df <- read.csv(file="./Data/sqf_not_innocent.csv", header=TRUE)

ggplot(not_innocent_df, aes(x=reorder(Category, Count), y=Count/1000)) + geom_bar(stat="identity") + theme_grey(16) + coord_flip() + ylab("Frequency(Thousands)") + xlab("") + theme(plot.margin = margin(10, 15, 10, 1)) + ggtitle("Stop-and-Frisk Efficacy")

ggplot(not_innocent_df, aes(x=reorder(Category, Percentage_of_Stops), y=Percentage_of_Stops)) + geom_bar(stat="identity") + theme_grey(16) + coord_flip() + ylab("Percentage of Stops") + xlab("") + scale_y_continuous(labels = scales::percent) + theme(plot.margin = margin(10, 15, 10, 1)) + ggtitle("Stop-and-Frisk Efficacy")

We will next analyze the effect that stop-and-frisk had on crime levels in NYC. Specifically, we will compare the levels of stop-and-frisk against the levels of crime for both misdemeanors and felonies from 2003 to 2017. The NYC crime statistics page classifies felonies into two categories: “Citywide Seven Major Felony Offenses” and “Citywide Non-Seven Major Felony Offenses”. The seven major felonies inlcude crimes such as murder, rape, robbery, felony assault, burglary, grand larceny, and grand larceny of a motor vehicle. The non-seven major felonies include arson, drug possession, weapons possession, fraud, forgery, etc. Since the four categories of data (stop-and-frisk, major felonies, non-major felonies, misdemeanors) that we wanted to plot together have vastly different scales, we indexed all the data points according to their respective starting values in 2003. As evinced from the line plot below, all types of crime have steadily decreased from 2003 to 2017. Stop-and-frisk, on the other hand, increased quickly throughout the mid to late 2000s and peaked in 2011. We observe that during this same time period, non-major felonies and misdemeanors have remained steady while major felonies did experience a noticeable decline. One could argue that this could have been due to the increased instances of stop-and-frisk throughout NYC. However, after the election of Bill de Blasio and the subsequent overhaul of the stop-and-frisk program, we notice that crimes did not increase at all. In fact, crime for all three categories (major felonies, non-major felonies, misdemeanors) have continued to decline. Moreover, 2017 has seen the lowest level of crime in all the years we analyzed while simultaneously also having the least amount of stop-and-frisks. This indicates that stop-and-frisk may not have had much of an impact on crime rate in NYC as otherwise suggested by Michael Bloomberg.

# https://www1.nyc.gov/site/nypd/stats/crime-statistics/historical.page
sqf_vs_crime_by_year <- sqf_by_year
colnames(sqf_vs_crime_by_year) <- c("year", "sqf_count")
major_felonies_by_year <- c(147069, 142093, 135475, 128682, 121009, 117956, 106730, 105115, 106669, 111147, 111335, 106722, 105453, 101716, 96658)
non_major_felonies_by_year <- c(61217, 66733, 67854, 69028, 70958, 68958, 63760, 59387, 57240, 56902, 57650, 56869, 56520, 58346, 54907)
misdemeanors_by_year <- c(365471, 357724, 353649, 361574, 378616, 380406, 386765, 391892, 383108, 374364, 359350, 348371, 322848, 314925, 300354)

sqf_vs_crime_by_year$major_felonies <- major_felonies_by_year
sqf_vs_crime_by_year$nonmajor_felonies <- non_major_felonies_by_year
sqf_vs_crime_by_year$misdemeanors <- misdemeanors_by_year

starting_values_df <- sqf_vs_crime_by_year[sqf_vs_crime_by_year$year == 2003,]
sqf_vs_crime_by_year["Stop-and-Frisk"] <- (sqf_vs_crime_by_year$sqf_count/starting_values_df$sqf_count) * 100
sqf_vs_crime_by_year["Major Felonies"] <- (sqf_vs_crime_by_year$major_felonies/starting_values_df$major_felonies) * 100
sqf_vs_crime_by_year["Non-Major Felonies"] <- (sqf_vs_crime_by_year$nonmajor_felonies/starting_values_df$nonmajor_felonies) * 100
sqf_vs_crime_by_year["Misdemeanors"] <- (sqf_vs_crime_by_year$misdemeanors/starting_values_df$misdemeanors) * 100

sqf_crime_indexed_by_year <- melt(sqf_vs_crime_by_year, id.vars = "year", measure.vars = c("Stop-and-Frisk", "Major Felonies", "Non-Major Felonies", "Misdemeanors"))

colnames(sqf_crime_indexed_by_year)[2] <- "Category"

cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

ggplot() + geom_line(data=sqf_crime_indexed_by_year, aes(x=year, y=value, color=Category), lwd=1.5) + xlab("Year") + ylab("Index") + ggtitle("Stop-and-Frisk vs. Crime") + scale_color_manual(values=cbbPalette) + theme_grey(16) + theme(plot.title=element_text(hjust=0.5)) + scale_x_continuous("", labels = as.character(sqf_crime_indexed_by_year$year), breaks = sqf_crime_indexed_by_year$year)

sqf_race_agg_by_year <- read.csv("./Data/sqf_race_by_year.csv", header=TRUE)

ggplot() + geom_line(data=sqf_race_agg_by_year, aes(x=year, y=percentage, color=race), lwd=1.5) + ylab("Percentage") + ggtitle("Stops by Race Across Years") + guides(fill=guide_legend(title="")) + scale_color_brewer(palette="Set1") + theme_grey(16) + theme(plot.title=element_text(hjust=0.5)) + scale_x_continuous("", labels = as.character(sqf_race_agg_by_year$year), breaks = sqf_race_agg_by_year$year) + scale_y_continuous(labels = scales::percent_format(accuracy = 1))

Interactive Component

Finally, we added an interactive component built in D3 which can be seen at (https://jonashan90.github.io/nyc_crime_project/)[https://jonashan90.github.io/nyc_crime_project/]. This analysis complements the graphs above by allowing the viewer to subset the stop and frisk data along the time axis, the geographic location and look at the racial breakdown by police precincts (originally we wanted to plot the data by latitude/longitude but the lack of full data for this prevented us). Several trends can be derived from this data. We can see that the stop and frisk program began in east Brooklyn and stayed concentrated there - precincts 73 & 75 had over 10,000 stops in the first full reporting year (2004) and remained at those levels through 2012. We can see looking at the analysis that the vast majority of the suspects in those stops in precinct 73 were black followed by white-Hispanic. (interesting, there appear to be relatively few people identified as black-Hispanic in the dataset). By 2005 the SQF program had begun to move into the Bronx and Harlem, with precinct 40 in South Bronx having about 13000 stops. This precinct showed more Hispanics relative to the number of African-Americans than the precincts in Brooklyn. By 2006 SQF practices had reached Staten Island and East New York. Lower and middle Manhattan remain relatively unscathed throughout the program except for the Times Square district. This may be related to the broken windows policing policy or simply to extra policing in tourist districts or both. In the description for the interactive component, we briefly touch on the story of the lead plaintiff for the class-action suit against stop and frisk (Floyd v City of New York) and note that David Floyd can be counted as a statistic from precinct 43 in the Bronx in 2008. We can see that similar to our previous analysis for a Bronx district his district’s suspects were majority-black (as he was) with about 2/3rds Hispanic. Near La Guardia Airport (precincts 109, 115, 110) we can see our first “purple districts” (greater than 10000 stops) with majority-Hispanic stops. Interestingly, precinct 114 just adjacent is majority-black (this is the precinct that includes Rikers Island, although hopefully there weren’t many stop and frisks happening there). By 2011, we can see that the program has peaked with more than 30 purple districts in the city, as noted above. As also noted above, there is not a corresponding peak in felonies, major and non-major or misdemeoners. It’s an interesting and underexplored part of this analysis as to why the peak occurred. Critics alleged that quotas as driven by the Compustat program may have exacerbated stops made under the stop and frisk program. It would be interesting to know exactly which statistics are tracked under Compustat and see if there was any explanatory factors for the policing spike here. The drop in rates in 2012 was also an interesting data artifact. The court ruling and election of Bill de Blasio didn’t happen until 2013, however public backlash to the program had reached high levels by then. The broken windows theory of policing is supposedly about community policing and it would be interesting to know if the level of community activism or outreach affected the amount of stop and frisks on a per-precinct level. By 2013, much of the map is green with only our old friends precincts 73 & 75 still with elevated levels of arrest. As a interactive data visualization tool, this map is fairly limited in what it allows. One thought we had was to unite this dataset with the arrests dataset; however the lack of historic data for arrests before 2013 made this impractical (the program was practically over by then). Interesting points we left out were whether or not a frisk occured whether or not an arrest occurred. As mentioned in the opinion, at least one analysis (the Fagan report) went through the entire list of stops and attempted to classify them into “apparently justified”, “apparently unjustified” or “ungeneralizable” based on the UF-250 form that officers were required to fill out after. It would be interesting to incorporate the results of this analysis into the map.

Conclusion